Skip to content

Conversation

CoryMartin-NOAA
Copy link
Contributor

@CoryMartin-NOAA CoryMartin-NOAA commented Oct 10, 2025

Description

This PR will add the capability to convert GSI netCDF diagnostic files to IODA-compliant netCDF files, which will also facilitate computing summary stats from GSI leveraging new, IODA-based tools.

Resolves #4145

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO

  • Does this change require a documentation update? NO

  • Does this change require an update to any of the following submodules? YES

How has this been tested?

Example:

  • Clone and build on Ursa
  • Cycled test on Ursa

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@CoryMartin-NOAA
Copy link
Contributor Author

This will produce, for example in:
COMROOT/C96_atm3DVar/gdas.20211221/00/products/atmos/anlmon

gdas.t00z.atmos_gsi_ioda_diags.tgz  gdas.t00z.atmos_gsi_iodastat.tgz  sfc_ps_2021122100_output_atmos_gsi.nc

Where
gdas.t00z.atmos_gsi_ioda_diags.tgz contains numerous IODA files converted from GSI ncdiags:

scatwind_obs_2021122100.nc4.gz
abi_g17_obs_2021122100.nc4.gz
seviri_m11_obs_2021122100.nc4.gz
seviri_m08_obs_2021122100.nc4.gz
sfc_ps_obs_2021122100.nc4.gz
sfc_uv_obs_2021122100.nc4.gz
sfcship_uv_obs_2021122100.nc4.gz
sondes_q_obs_2021122100.nc4.gz
avhrr_n18_obs_2021122100.nc4.gz
sfc_q_obs_2021122100.nc4.gz
ompstc8_npp_obs_2021122100.nc4.gz
aircraft_tsen_obs_2021122100.nc4.gz
aircraft_q_obs_2021122100.nc4.gz
sfcship_tv_obs_2021122100.nc4.gz
sondes_ps_obs_2021122100.nc4.gz
atms_npp_obs_2021122100.nc4.gz

gdas.t00z.atmos_gsi_iodastat.tgz currently only contains sfc_ps_2021122100_output_atmos_gsi.nc

@CoryMartin-NOAA
Copy link
Contributor Author

@RussTreadon-NOAA @ADCollard one thing here is that the GSI diags are separate files for _ges and _anl, and I don't know which one ends up getting saved (depends on what is processed last). Do we need to find a way to keep both GsiHofXBc from ges and anl, can we just use ges? Thoughts?

@RussTreadon-NOAA
Copy link
Contributor

@RussTreadon-NOAA @ADCollard one thing here is that the GSI diags are separate files for _ges and _anl, and I don't know which one ends up getting saved (depends on what is processed last). Do we need to find a way to keep both GsiHofXBc from ges and anl, can we just use ges? Thoughts?

@CoryMartin-NOAA. I checked a JEDI atmstat tarball from the prjedi. The observation specific netcdf files contain hofx0 and hofx1.

The prjedi atmvar.yaml has

variational:
  minimizer:
    algorithm: DRPCG
  iterations:
  - ninner: 50
    gradient norm reduction: 1e-10
    test: true
    geometry:
      fms initialization:
        namelist filename: ./fv3jedi/fmsmpp.nml
        field table filename: ./fv3jedi/field_table
      akbk: ./fv3jedi/akbk.nc4
      layout:
      - 8
      - 8
      npx: 193
      npy: 193
      npz: 127
      vert coordinate: logp
    diagnostics:
      departures: bkgmob
final:
  diagnostics:
    departures: anlmob

It's my impression that bkgmob creates hofx0. This is the ges H(x) since we only have a single outer loop. The anlmob creates hofx1. This is the anl H(x). Seems we have both the ges and anl H(x) is the JEDI diagnostic file. Not sure I'm answering your question.

@CoryMartin-NOAA
Copy link
Contributor Author

@RussTreadon-NOAA no I'm talking about the GSI ncdiags, they have Obs_Minus_Forecast_adjusted which gets 'translated' into GsiHofXBc but there are 2 files, one _ges and one _anl. Which (both?) should we save when we are converting the diags to IODA format?

@RussTreadon-NOAA
Copy link
Contributor

My two cents: We should save both the guess and analysis bias corrected innovations. Both sets of innovations are plottable from the current RadMon time series page.

@CoryMartin-NOAA
Copy link
Contributor Author

@RussTreadon-NOAA that was my thinking too

@ADCollard
Copy link
Contributor

ADCollard commented Oct 14, 2025

@RussTreadon-NOAA that was my thinking too

@CoryMartin-NOAA @RussTreadon-NOAA Yes, we need both. I think @RussTreadon-NOAA was pointing out that we keep the same info in hofx0 and hofx2. If you want to keep the format consistent.

@CoryMartin-NOAA
Copy link
Contributor Author

I'm hesitant to keep the format consistent as it will make it easier for us to know what came from where (GSI vs JEDI) but I'm open to making subsequent changes to the converter scripts (in the DA-Utils PR) to make things more consistent

@CoryMartin-NOAA
Copy link
Contributor Author

@RussTreadon-NOAA @ADCollard now I have it so that the files produced are like so:

netcdf sfc_ps_gsi_2021122100 {
dimensions:
        Location = UNLIMITED ; // (75151 currently)
        Channel = 3 ;
variables:
        int64 Location(Location) ;
                Location:suggested_chunk_dim = 10000LL ;
        int Channel(Channel) ;
                Channel:suggested_chunk_dim = 3LL ;

// global attributes:
                :_ioda_layout = "ObsGroup" ;
                :_ioda_layout_version = 0 ;
                :converter = "gsi_ncdiag.py" ;

group: GsiAdjustObsErrorGes {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiAdjustObsErrorGes

group: GsiEffectiveQCGes {
  variables:
        int stationPressure(Location) ;
  } // group GsiEffectiveQCGes

group: GsiFinalObsErrorGes {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiFinalObsErrorGes

group: GsiHofXBcGes {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiHofXBcGes
...
group: ObsValue {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
                stationPressure:coordinates = "longitude latitude" ;
  } // group ObsValue
...

group: GsiAdjustObsErrorAnl {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiAdjustObsErrorAnl

group: GsiEffectiveQCAnl {
  variables:
        int stationPressure(Location) ;
  } // group GsiEffectiveQCAnl

group: GsiFinalObsErrorAnl {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiFinalObsErrorAnl

group: GsiHofXAnl {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiHofXAnl

group: GsiHofXBcAnl {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiHofXBcAnl

where the groups have Ges and Anl suffixes in the same file

@ADCollard
Copy link
Contributor

@RussTreadon-NOAA @ADCollard now I have it so that the files produced are like so:

netcdf sfc_ps_gsi_2021122100 {
dimensions:
        Location = UNLIMITED ; // (75151 currently)
        Channel = 3 ;
variables:
        int64 Location(Location) ;
                Location:suggested_chunk_dim = 10000LL ;
        int Channel(Channel) ;
                Channel:suggested_chunk_dim = 3LL ;

// global attributes:
                :_ioda_layout = "ObsGroup" ;
                :_ioda_layout_version = 0 ;
                :converter = "gsi_ncdiag.py" ;

group: GsiAdjustObsErrorGes {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiAdjustObsErrorGes

group: GsiEffectiveQCGes {
  variables:
        int stationPressure(Location) ;
  } // group GsiEffectiveQCGes

group: GsiFinalObsErrorGes {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiFinalObsErrorGes

group: GsiHofXBcGes {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiHofXBcGes
...
group: ObsValue {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
                stationPressure:coordinates = "longitude latitude" ;
  } // group ObsValue
...

group: GsiAdjustObsErrorAnl {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiAdjustObsErrorAnl

group: GsiEffectiveQCAnl {
  variables:
        int stationPressure(Location) ;
  } // group GsiEffectiveQCAnl

group: GsiFinalObsErrorAnl {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiFinalObsErrorAnl

group: GsiHofXAnl {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiHofXAnl

group: GsiHofXBcAnl {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiHofXBcAnl

where the groups have Ges and Anl suffixes in the same file

Nice! But don't you need to add GsiHofXGes?

@CoryMartin-NOAA
Copy link
Contributor Author

@ADCollard it's there I just didn't copy the entire ncdump

group: GsiHofXBcGes {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiHofXBcGes

group: GsiHofXGes {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiHofXGes

group: GsiInputObsErrorGes {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiInputObsErrorGes

group: GsiQCWeightGes {
  variables:
        float stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiQCWeightGes

group: GsiUseFlagGes {
  variables:
        int stationPressure(Location) ;
                stationPressure:units = "Pa" ;
  } // group GsiUseFlagGes

group: MetaData {
  variables:
        int64 dateTime(Location) ;
                dateTime:units = "seconds since 1970-01-01T00:00:00Z" ;
        float height(Location) ;
                height:units = "m" ;
        float latitude(Location) ;
                latitude:units = "degree_north" ;
        float longitude(Location) ;
                longitude:units = "degree_east" ;
        float pressure(Location) ;
                pressure:units = "Pa" ;
        int satelliteIdentifier(Location) ;
        int sequenceNumber(Location) ;
        float stationElevation(Location) ;
                stationElevation:units = "m" ;
        string stationIdentification(Location) ;
  } // group MetaData

Copy link
Contributor

@ADCollard ADCollard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Convert GSI netCDF diag files to JEDI IODA format in the workflow

3 participants